What is a stable diffusion model?
Stable diffusion models are state-of-the-art deep generative models in machine learning. They can generate photo-realistic images from latent space and have gained significant attention over the past several years. Many famous generative AI services, such as DALL-E 2 and Midjourney, are based on the technique of stable diffusion models.
What is the goal of this notebook?
In this notebook, I build an *unconditional* denoising diffusion model from scratch and use it to generate anime faces.
Anime faces generated from latent space:
My code and results are shown below.
References:
Denoising Diffusion Probabilistic Models
Improved Denoising Diffusion Probabilistic Models
How Stable Diffusion works
Github:
Sung-Soo Kim's implementation
Niels Rogge & Kashif Rasul's implementation
I used the Anime_Faces_Dataset_128 dataset from Kaggle, which consists of around 100K high-quality anime face images (128*128, white background, same pose).
import torch
import torchvision
import matplotlib.pyplot as plt
import numpy as np
from torch import nn
from torch.optim import Adam, SGD, AdamW
from torchvision import transforms
from torch.utils.data import DataLoader
from torchinfo import summary
from PIL import Image
from tqdm import tqdm
""" Plots some samples from the dataset """
nsamples = 20
ncols = 5
data = torchvision.datasets.ImageFolder(root=r'/Users/mwchang/Downloads/faces')
indices = torch.randint(low=0, high=len(data), size=(nsamples,))
samples = torch.utils.data.Subset(data, indices)
plt.figure(figsize=(15,15))
for i, img in enumerate(samples):
plt.subplot(int(nsamples/ncols) + 1, ncols, i + 1)
plt.imshow(img[0])
del(data)
In a denoising diffusion model, the noise-corrupted input data is treated as the initial state. The model then applies a series of denoising steps, each of which reduces the noise level and refines the data.
At each denoising step, the model estimates the posterior distribution of the cleaner data given the current state and noise level, leveraging the patterns and structure it has learned from the data. By iteratively applying these denoising steps, the model gradually removes the noise and converges toward a clean sample of the original data distribution.
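The noise-corrupted states that the model learns to reverse come from the forward diffusion process. A useful property (from the DDPM paper referenced above) is that any timestep can be sampled in closed form: x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps, with eps ~ N(0, I). Below is a minimal sketch of this forward step; the schedule length T = 1000 and the linear beta range (1e-4 to 0.02) follow the DDPM paper's defaults and are assumptions, not values taken from this notebook.

```python
import torch

# Assumed noise schedule (DDPM defaults, not this notebook's exact settings):
T = 1000
betas = torch.linspace(1e-4, 0.02, T)      # linear beta schedule
alphas = 1.0 - betas
alpha_bars = torch.cumprod(alphas, dim=0)  # alpha_bar_t = prod_{s<=t} alpha_s

def q_sample(x0, t, noise=None):
    """Sample x_t ~ q(x_t | x_0) in closed form:
    x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * noise."""
    if noise is None:
        noise = torch.randn_like(x0)
    ab = alpha_bars[t].view(-1, 1, 1, 1)  # broadcast over (B, C, H, W)
    return ab.sqrt() * x0 + (1.0 - ab).sqrt() * noise

x0 = torch.randn(4, 3, 128, 128)  # a fake batch standing in for clean images
t = torch.randint(0, T, (4,))     # one random timestep per image
xt = q_sample(x0, t)
```

Note that alpha_bar_t shrinks toward zero as t grows, so late timesteps are almost pure noise; training pairs the model's noise prediction at a random t against the `noise` actually used here.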